21 research outputs found

    XML document design via GN-DTD

    Designing a well-structured XML document is important for readability and maintainability. More importantly, it avoids data redundancies and update anomalies when maintaining a large quantity of XML-based documents. In this paper, we propose a method to improve XML structural design by adopting graphical notations for Document Type Definitions (GN-DTD), which are used to describe the structure of an XML document at the schema level. Multiple levels of normal forms for GN-DTD are proposed on the basis of conceptual model approaches and theories of normalization. The normalization rules are applied to transform a poorly designed XML document into a well-designed one based on normalized GN-DTD, which is illustrated through examples.
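
The redundancy and update-anomaly problem named in this abstract can be sketched in a few lines of Python. This is a minimal illustration only, not the GN-DTD method: the document, its element names (`enrolments`, `student`, `course`, `title`) and the helper functions are all hypothetical, and the dependency checked ("a course code determines its title") is an assumed example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical document: every <student> repeats the title of the course taken,
# so the same fact (code C1 -> "Databases") is stored more than once.
DOC = """
<enrolments>
  <student id="s1"><course code="C1"><title>Databases</title></course></student>
  <student id="s2"><course code="C1"><title>Databases</title></course></student>
  <student id="s3"><course code="C2"><title>Algorithms</title></course></student>
</enrolments>
"""

def title_copies(xml_text):
    """Map each course code to the list of titles stored for it."""
    copies = defaultdict(list)
    for course in ET.fromstring(xml_text).iter("course"):
        copies[course.get("code")].append(course.findtext("title"))
    return dict(copies)

def redundant_copies(copies):
    """Count stored titles beyond the single copy each course code needs."""
    return sum(len(titles) - 1 for titles in copies.values())

def update_anomalies(copies):
    """Course codes whose stored titles disagree with each other."""
    return sorted(code for code, titles in copies.items() if len(set(titles)) > 1)

copies = title_copies(DOC)
print(redundant_copies(copies))
print(update_anomalies(copies))

# Changing only one copy of a repeated title produces an update anomaly.
edited = title_copies(DOC.replace("Databases", "DB Systems", 1))
print(update_anomalies(edited))
```

A normalized design would store each course title once and let students refer to the course by code, so a title change touches a single element.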

    XML documents schema design

    The eXtensible Markup Language (XML) is fast emerging as the dominant standard for storing, describing and interchanging data among various systems and databases on the internet. It offers schema languages such as Document Type Definition (DTD) or XML Schema Definition (XSD) for defining the syntax and structure of XML documents. To enable efficient usage of XML documents in any application in a large-scale electronic environment, it is necessary to avoid data redundancies and update anomalies. Redundancy and anomalies in XML documents can lead not only to higher data storage cost but also to increased costs for data transfer and data manipulation. To overcome this problem, this thesis proposes to establish a formal framework of XML document schema design. To achieve this aim, we propose a method to improve and simplify XML schema design by incorporating a conceptual model of the DTD with a theory of database normalization. A conceptual diagram, Graph-Document Type Definition (G-DTD), is proposed to describe the structure of XML documents at the schema level. For G-DTD itself, we define a structure which incorporates attributes, simple elements, complex elements, and relationship types among them. Furthermore, semantic constraints are also precisely defined in order to capture semantic meanings among the defined XML objects. In addition, to provide a guideline to a well-designed schema for XML documents, we propose a set of normal forms for G-DTD on the basis of rules proposed by Arenas and Libkin and by Lv et al. The corresponding normalization rules to transform a G-DTD into a normal form schema are also discussed. A case study is given to illustrate the applicability of the concept. As a result, we found that the new normal forms are more concise and practical, in particular as they allow the user to find an 'optimal' structure of XML elements/attributes at the schema level.
    To prove that our approach is applicable for the database designer, we develop a prototype of XML document schema design using the Z formal specification language. Finally, using the same case study, this formal specification is tested to check the correctness and consistency of the specification. This gives confidence that our prototype can be implemented successfully to generate an XML schema design automatically.

    Extraction Of Defensin Antimicrobial Peptide From SWISS-PROT Database Using Extended Boyer-More Algorithm.

    Antimicrobial peptides (AMPs) are a subset of proteins that play an essential role in the innate immune system. Research on AMPs is actively conducted in the field of immunology, where synthetic antibiotics are being developed. There are several AMP families, each with a different mechanism for immobilizing pathogens. Thus, family classification could help speed up the search for AMPs of a specific family.

    iProt – A Data Warehouse For Protein Database.

    An integrated database is a vital tool for helping biologists analyze protein data from heterogeneous formats and resources. iProt (Integrated Protein Data Warehouse) is a pool of protein datasets that provides comprehensive protein sequence, 3D structure, enzymatic reaction, gene description, and taxonomic data from five different protein databases: Swiss-Prot, Protein Data Bank (PDB), ENZYME, NCBI Taxonomy and Gene Ontology (GO).

    System Development: What, Why, When And How CASE Tools Should Support Novice Software Engineers.

    Novice software engineers, particularly computer science students, need to be trained in the theoretical knowledge and practical skills of system development. The knowledge and skills may encompass the activities involved in all phases of system development, including analysis, design, coding, testing and maintenance.

    A Study of Customer Behaviour Through Web Mining

    Web mining is the extraction of interesting and potentially useful patterns and hidden information from web documents and web activities by applying data mining technology. The most important challenge of electronic commerce (E-commerce) is to understand as much as possible the customers' wants, desires, and buying patterns to ensure competitiveness in the E-commerce era. Nowadays, any information related to consumer behavior has an important value in the highly competitive nature of the E-commerce market. Therefore, web mining can be used to find those obvious data that have potential value to reduce competition and simultaneously increase business profit. This paper aims to study the classification of web mining to extract customer behavior in E-commerce, investigate customer behavior through the techniques and processes of the web data mining used, explore the application of web mining in E-commerce, and increase profit

    A System To Integrate And Manipulate Protein Database Using BioPerl And XML.

    The size, complexity and number of databases used for protein information have caused bioinformatics to lag behind in adapting to the need to handle this distributed information. Integrating all the information from different databases into one database is a challenging problem

    The Improved Hybrid Algorithm for the Atheer and Berry-Ravindran Algorithms

    Exact string matching is one of the important ways of solving basic problems in computer science. This research proposed a hybrid exact string matching algorithm called E-Atheer. The algorithm combines the strengths of two existing algorithms: the searching technique of the Atheer algorithm and the shifting technique of the Berry-Ravindran algorithm. The proposed algorithm showed better performance in the number of attempts and character comparisons than the original, recent, and standard algorithms. The E-Atheer algorithm was run on several types of databases: DNA, Protein, XML, Pitch, English, and Source. The best performance in the number of attempts was obtained on the Pitch dataset, and the worst on the DNA dataset. The best and worst databases in the number of character comparisons with the E-Atheer algorithm are the Source and DNA databases, respectively.
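
The two metrics reported in this abstract, attempts and character comparisons, can be illustrated with a minimal Python sketch of the Berry-Ravindran shift rule. This is not E-Atheer: the abstract does not describe the Atheer searching technique, so the sketch assumes a plain left-to-right comparison inside each window, and the function names `br_shift_table` and `search` are made up for illustration.

```python
def br_shift_table(pattern):
    """Return shift(a, b) for the two text characters just past the window."""
    m = len(pattern)
    pairs = {}
    for i in range(m - 1):
        # Later (rightmost) occurrences overwrite earlier ones: smaller shift.
        pairs[(pattern[i], pattern[i + 1])] = m - i

    def shift(a, b):
        if a == pattern[-1]:      # align the last pattern character with a
            return 1
        if (a, b) in pairs:       # align an occurrence of "ab" in the pattern
            return pairs[(a, b)]
        if b == pattern[0]:       # align the first pattern character with b
            return m + 1
        return m + 2              # neither character can overlap the pattern

    return shift


def search(pattern, text):
    """Return (match positions, attempts, character comparisons)."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m, n = len(pattern), len(text)
    shift = br_shift_table(pattern)
    matches, attempts, comparisons = [], 0, 0
    j = 0
    while j <= n - m:
        attempts += 1             # one attempt per window alignment
        i = 0
        while i < m:
            comparisons += 1      # one comparison per character pair examined
            if pattern[i] != text[j + i]:
                break
            i += 1
        if i == m:
            matches.append(j)
        # Characters past the end of the text are treated as "no character".
        a = text[j + m] if j + m < n else None
        b = text[j + m + 1] if j + m + 1 < n else None
        j += shift(a, b)
    return matches, attempts, comparisons


print(search("GCAGAGAG", "GCATCGCAGAGAGTATACAGTACG"))
```

The shift depends only on the two characters following the window, which is why fewer attempts are expected on large alphabets than on a four-letter alphabet such as DNA.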

    Application Of Exact String Matching Algorithms Towards SMILES Representation Of Chemical Structure.

    Bioinformatics and cheminformatics are disciplines that use computers to provide tools for the acquisition, storage, processing, analysis, and integration of data, and for the development of potential applications of biological and chemical data. A chemical database is a database designed exclusively to store chemical information.

    Hybrid ensemble model with optimal weightage for suicidal behavior prediction

    Suicidal behavior is a complex phenomenon that is contextually dependent and changes rapidly from one day to another. The problem in predicting suicidal behavior is identifying individuals and at-risk groups in crisis and at risk for suicide. The current predictive model, which uses machine learning techniques, has been shown to lack accuracy, and no study has attempted to use a voting ensemble model to predict suicidal behavior. The soft voting ensemble model demonstrated good performance in the healthcare setting, but assigning optimal weights for machine learning models is challenging. Therefore, this paper aims to propose a hybrid voting ensemble model to achieve optimal weights in predicting an individual with suicidal behavior. The results show that the proposed hybrid voting ensemble model can effectively classify an individual with suicidal behavior with an accuracy of 0.84 compared to other machine learning models (logistic regression, support vector machine, random forest, gradient boosting). Hybridization of soft voting with brute force algorithm has shown that the proposed hybrid ensemble model can find the optimal weights for the machine learning model in the context of predicting suicidal behavior. Furthermore, the proposed hybrid ensemble model shows that clinical data can be used to improve the performance of machine learning models in predicting an individual with suicidal behavior
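
The idea of pairing soft voting with a brute-force search for weights can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the probabilities and labels are made-up toy values, the weight grid `levels` is an arbitrary choice, accuracy at a 0.5 threshold is assumed as the selection criterion, and all function names are hypothetical.

```python
from itertools import product


def soft_vote(probabilities, weights):
    """Weighted average of each model's positive-class probability, per sample."""
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, sample)) / total
        for sample in zip(*probabilities)
    ]


def accuracy(scores, labels, threshold=0.5):
    """Fraction of samples whose thresholded score equals the label."""
    hits = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return hits / len(labels)


def brute_force_weights(probabilities, labels, levels=(0, 1, 2, 3)):
    """Try every weight combination on the grid and keep the most accurate one."""
    best_weights, best_acc = None, -1.0
    for weights in product(levels, repeat=len(probabilities)):
        if sum(weights) == 0:     # all-zero weights define no ensemble
            continue
        acc = accuracy(soft_vote(probabilities, weights), labels)
        if acc > best_acc:
            best_weights, best_acc = weights, acc
    return best_weights, best_acc


# Toy data (made up): one list of positive-class probabilities per model.
labels = [1, 0, 1, 0]
model_a = [0.9, 0.2, 0.6, 0.4]
model_b = [0.3, 0.7, 0.2, 0.9]

print(accuracy(soft_vote([model_a, model_b], (1, 1)), labels))
print(brute_force_weights([model_a, model_b], labels))
```

The search cost grows as the number of grid levels raised to the number of models, so an exhaustive search of this kind is practical only for a handful of base models. In practice the weights should be chosen on validation data rather than on the data used to report accuracy.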